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Detecting non-sinusoidal periodicities in observational data 
using multi-harmonic periodograms 
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ABSTRACT 

We address the problem of assessing the statistical significance of candidate period- 
icities found using the so-called 'multi-harmonic' periodogram, which is being used 
for detection of non-sinusoidal signals, and is ba sed on the le ast-squares fitting of 
truncated Fourier series. The recent investigation (|Baluevll2008( ) made for the Lomb- 
Scargle periodogram is extended to the more general multi-harmonic periodogram. As 
a result, closed and efficient analytic approximations to the false alarm probability, as- 
sociated with multi-harmonic periodogram peaks, are obtained. The resulting analytic 
approximations are tested under various conditions using Monte Carlo simulations. 
The simulations showed a nice precision and robustness of these approximations. 

Key words: methods: data analysis - methods: statistical - surveys 



1 INTRODUCTION 



The lLombl (jl976l )- IScargle1 (|l982p (hereafter LS) periodogram 
is a well-known powerful tool, which is widely used to search 
for periodicities in observational data. The main idea used 
in the LS periodogram is to perform a least-squares fit of the 
data with a sinusoidal model of the signal and then to check 
how much the resulting weighted r.m.s. have decreased for a 
given signal frequency. The maximum value of the LS peri- 
odogram (i.e., the maximum decrement in the least-squares 
goodness-of-fit measure) corresponds to the most likely fre- 
quency of the periodic signal. This natural idea is quite easy 
to implement in numerical calculations. 

However, random errors in the input data inspire noise 
peaks on the periodogram, so that we can never be com- 
pletely sure that the peak that we actually observed was 
produced by a real periodicity. The common way to as- 
sess the statistical significance of the observed peak is based 
on the associated 'false alarm probability' (hereafter FAP). 
The FAP is the probability that the observed or larger peri- 
odogram peak could be produced by random measurement 
errors. The smaller is FAP, the larger is the statistical signif- 
icance. Given some tolerance value FAP* (say, 1%), we could 
claim that the detected candidate periodicity is statisticaly 
significant (if FAP < FAP*) or is not (if FAP > FAP*). 

From the statistical viewpoint, the FAP is tightly con- 
nected with the probability distribution of periodogram 
maxima, which are calculated within some a priori fixed fre- 
quency segment Q However, even approximate calculation of 



* E-mail: roman@astro.spbu.ru 

1 Speaking more precisely, the periodogram maxima are always 



this distribution is a non-trivial task. It represented a trouble 
for astronomers for about three decades. It is worthwile to 
menti on here, fo r insta n ce, the papers bylHorne fc Baliunasl 



jl986ft: iKoenl (|l990l) ; ISchwarzenberg-Czernvl (Il998allbl) : 



ICumming et al.1 l| 19991 ): ICummind (|2004 ): iFrescura et all 



(2008). Recently, a s ignificant pro gress in this field was at- 
tained in the paper (Balucv 2008), where closed and simul- 
taneously rather efficient approximations of the FAP for the 
LS periodogram are given, basing on results in the theory of 
extreme values of stochastic processes. 

However, periodic signals being dealt with in astronomy 
often are significantly non-sinusoidal. Then the use of the LS 
periodogram is not optimal, since the corresponding periodic 
variation would be fitted inadequately. For instance, it is the 
case for lightcurves of variable stars of several types and for 
radial velocity curves of stars orbited by a planet on an ec- 
centric orbit. Several ways to deal w ith this issue were pro- 
posed (for further references see e.g. ISchwarzenberg-Czernvl 
Il998al lbl). In this paper, we fo cus attention on the so-calle d 
multi-harmonic periodogram ( Sc hwarzenberg-Czernvl l 1996) . 
which is based on the least-squares fitting of tru ncated 
Fourier series. Note that in the paper (|Baluevll200"8T ) a gen- 
eral class of periodograms based on the least-squares data 
fitting was considered as well, but from theoretical positions 
only. Here our aim is to apply these general results to the 
multi-harmonic periodograms. 

The plan of the paper is as follows. In Section [2] we 



calculated over some discrete set of values. However, in practice 
the periodograms are usually plotted on a dense frequency grid, 
which is practically equivalent to a continuous segment. It is the 
case that we consider in the paper. 
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formulate the problem rigorously and introduce the neces- 
sary mathematical definitions. In Section [3] basing on the 
work l|Baluevll200"8h . we derive closed approximations of the 
FAP, associated with multi-harmonic periodogram peaks. 
In Section [4] we use numerical Monte Carlo simulations to 
quantify the accuracy of these analytic approximations. 



2 GENERAL DEFINITIONS 

Let us write down the temporal model of the putative peri- 
odic signal using a trigonometric polynomial of some a priori 
stated degree n: 



n 

fj,(t, 8, f) = (at cos 2nk ft + bk sin 2-irkft) . 



(1) 



where / is the signal frequency and the vector 8 incorpo- 
rates d = 2n Fourier coefficients a,k,bk- Fur ther we adop t 
exactly the same notations as those used in (|Baluevll200"i) . 
Clearly, the model fi is linear: fi(t, 8, f) — 9 ■ <p(t, /), where 
the vector ip(t, f) incorporate the first n harmonics of the 
Fourier basis. In addition to the signal model fi, we define 
the base temporal model \ih (i, Oh ) = 8h • 'fin (t) , which is as- 
sumed to be linear with respect to dn unknown parameters 
On - This base model may represent, for instance, a constant 
or a long-term polynomial (e.g., linear or quadratic) tem- 
poral trend. Therefore, the alternative (full) model is given 
by (itc(t, 8k, f) = ^n(t, On) + K*i 0, /), where 8 K incorpo- 
rates all parameters in 8n and 8. From the viewpoint of the 
statistical tests theory, we need to test the base hypothesis 
Ti : 8 — against the alternative one K, : 8 7^ 0. 

The input dataset consists of TV measurements Xi taken 
at timings ti and having uncertainties Oi. We assume that 
the random errors of the measurements are statistically in- 
dependent and normally distributed. Below w e will deal wit h 
the least-squares periodograms defined in (|Baluev|[2003) . 
These periodograms are based on the linear least-squares 
fitting procedure. The basic one, z(f), represents the half- 
difference 



z(f)=[xn-xl(f)] A 



(2) 



where Xh an d Xk represent the minimum values of the 
X 2 goodness-of-fit statistic, calculated under the two cor- 
responding hypotheses, Ti and K,. Note that under the base 
hypothesis Ti the random quantities Xn an d Xk follow the 
^-distributions with Nh — N — dn and Nk = N — dx; de- 
grees of freedom and thus indeed represent x 2 - var i a tes. The 
periodogram z(f) can be only calculated if the variances Oi 
of the observational errors are known exactly. Usually we 
do not know these variances exactly, and can fix only the 
statistical weights, Wi oc 1/of , so that of = k/wi with the 
common factor k being unconstrained a priori. Therefore, we 
will also consider three modified least-squares periodograms: 



zi(f) = N n 



Xh 
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(3) 



These periodograms do not depend on k and can be calcu- 
lated even if k is unknown. The periodograms and 
Z2(f) represent normalizations of the basic periodogram 



z(f) by the sample variances of the residuals, calculated 
under one of the two hypotheses, Ti or K,. The periodogram 
23(/) is proportional to the logarithm of the likelihood ratio 
statistic. Mor e discussion of these definitions can be found in 
l|Baluevll200sh . A discussion of several issues associated with 
the least-squares interpretation of t he periodograms intro- 
duced a bove can be also found in (|Schwarzenberg-Czernvl 
Il998al lbl: IZechmeister fc Kiirster! 120091 ). The modified peri- 
odograms 21,2,3 sxe unique-value monotonic functions of 
each other and thus are entirely equivalent for the practi- 
cal use. 

The definitions @ imply that the statistical weights 
Wi should be known with sufficient precision, and only 
the proportionality factor is unknown. This framework is 
a usually ad opted for the period analysis of astronomi- 
cal data (e.g. ICilliland fc Baliunaslll987l ; llrwin et al.lll989l : 
IZechmeister fc Kiirster! I2009T ) and we adopt it here. Never- 
theless, sometime s this model may not work well. For in- 
stance, the paper (|Baluevll2009l ) discusses the case in which 
the weights of observations are not known a priori with suf- 
ficient precision. In this case, the traditional multi-harmonic 
periodograms being discussed here may not work well. 

We do not discuss in detail the numerical algorithms for 
calculation of the periodograms introduced. The form of the 
above definitions is more suitable for quantifying the statis- 
tical distributions of the corresponding periodograms. Fast 
numerical algorithms of practical e valuation of the multi- 
harmonic periodog rams are given in (|Schwarzenberg-Czernvl 



harmonic penodogr 
ll99rj : |Palmeii r2009) 



3 FALSE ALARM PROBABILITY 

Let us pick any of the periodograms introduced above, and 
denote it as Z(f). If the frequency of the putative signal 
was known, the false alarm probability FAP s i ng i c (Z), as- 
sociated with the given value Z(f), could be calulated as 
FAP sing i e (Z) = I — P sing i e (Z), where Psingie(Z) is the cumu- 
lative distribution of the corresponding periodogram value, 
calculated under the base hypothesis Ti. It is well-known 
that within simple constant scale factors these distributions 
are x 2 {d), F(d,N)c), a nd B(d, Nk) for the periodograms 
z, 22, an d zi, respect ively (see, e.g.. ISchwarzenberg-Czernvl 
Il998al lbl lB~aluevll2008n . Here the quantities in brackets mark 
the necessary numbers of degrees of freedom. 

When the signal frequency is unknown a priori, we need 
to search for a maximum of Z(f) within some wide frequency 
band [/min, /max]- From now on we will assume, for the sake 
of definiteness, that / m i n = 0. In practice it is a frequent 
case and also this assumption allows us to simplify the for- 
mal expressions. All results presented below can be easily 
extended to the case of arbitrary / m i n > 0. For example, 
we will need to replace certain lower integration limits ap- 
propriately and to change the expressions for the fr equency 
bandw idth from / max to / max — / m in- According to iBaluevl 
(2008), to estimate t he FAP associated with t he observed 
maximum, we use the iDaviesI l| 19771 . Il987l . I2002T ) bound 

FAP max (Z, ,f max ) s; FAP sing i c (Z) + t(Z, / max ), (4) 

Exact expressions for function r are given in !|Baluevll200"8h 
for the general least-squares periodogram z and for its mod- 
ifications 21,2,3 (see eqs. (7) and (8) in that paper). In fact, 
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the right hand side in the inequality ((4]) represents some- 
thing jnore than just an upper bound. It was demonstrated 
bv iBaluevl l|2008h . that in the LS periodogram case the in- 
equality ((4} appears rather sharp, especially for practically 
important low FAP levels. In addition to the bound (|4]), we 
will deal with the following approximation: 



1 — P 

- 1 - 1 max 
-r(Z,/, 



(■^5 /max), 



FAP 

max a 

Pma,x(Z, /max) ~ e ^ '*" lB ' X> P s inglc{Z) . (5) 

As it was discussed in ()Baluevll200Sf ). the formulae ([5]) should 
provide a good approximation to FAP m ax uniformly (i.e., for 
all FAP levels) in the case of small aliasing. Note that the 
approximation ([5]) and the bound (|4|) yield almost coinciding 
results if FAP < 0.1, so that the mentioned property of the 
approximation <(Sj probably will not have direct practical 
application. In this paper, we use ([5} just to plot a reference 
'alias-free' FAP curve. 

Now we need to obtain the function t(z, /max) for our 
special case of the multi-harmonic periodograms. In partic- 
ular, we need to calculate the factor ^4(/ m ax), present in the 
expressions for r. In general, this factor depends in a rather 
unpleasant way on the models of the data, on the time se- 
ries sampling, and on the sequence of the statistical weights 
of observations. To attain some technical simplicity, let us 
firstly assume that, like in the classical LS periodogram, the 
base model is empty: dn = and nn{t) = 0. In this case, 
we need to find firstly the eigenvalues A^ of the dx d matrix 
M, which is defined as: 



Q 



<p®ip, S = <p ® tp'j, 
M = Q _1 (R-S T Q" 1 S). 



(6) 



Here the overline denotes the weighted averaging over the 
time series and the binary operation CS> is the dyadic produc t 
of vectors (x®y = xy T ), see Appendix A in (|BalueyJ[2008). 
The notation tp'f stands for the partial derivative of the vec- 
torial func tion (pit, f) o ver /. We use the expressions from 
the paper (|Davies!l 19871 ) to calculate the factor A(/ ma x). We 
need to combine eqs. (3.2,3.3) an d the unnumb ered equation 
following after the eq. (3.4 ) from jp avics 1987) to obtain the 
formula (7) in the paper (Balucv 2008) with 



.4 



T n + 



df 



dx 

7372' 



(7) 



We need to obtain some more simple, although possi- 
bly approximate, expression for the factor A. To do this, we 
firstly obtain a suitable approximation to the matrix M and 
hence to its eigenvalues A^. After that, we can substitute 
the approximations for A^ to 0, in order to derive the final 
approximation to A(/ max ). We give the associated details, 
as well as an assessment of the practical precision of the 
resulting approximation, in the Appendix [S] Here we give 
only the final result, which seems to be sufficiently accu- 
rate in practice. The matrix M can be approximated by the 
following diagonal block form: 



( I 



\ 



12 











2 2 l 2 . 











. n 2 l 2 



(8) 



Table 1. The constants ct n for a few values of n. 

n 1 2 3 5 8 15 

a„ 1 1.556 1.062 0.136 9.921 • 10" 4 1.037 • 10 -10 



span (T e ff — V47rBt, where Bt is the weighted variance of 
timings ti, see Balucv 2008T). The approximate equality (JSj 
implies that the 2n eigenvalues required are grouped into n 

, n. Finally, 



pairs A2fc-i w A 2 fc 
A(/ max ) « 2n n+ ia n W, 
where W = /max Teff and 



71 = (2n- 1)!! ^ 



(-1) 



n — k !,2n-fl 



)!! (n + k)\(n-k)\ 



(9) 



(10) 



Here the quantity (2n — 1)!! represents the product of all odd 
integers from (2n— 1) downto 1. The numerical values of the 
constants a n for a few values of n are given in Tab le U 

Therefore, using eqs. (7,8) from |Baiuev|[2008), we ob- 
tain for the basic multi-harmonic periodogram z(f) 



(11) 



and for the associated modified periodograms 21,2,3 (/) 

_ 1 %-i 

( '2zlY 2 (\ - 2i±-\ 2 
\n-hJ \ n h J 



(12) 



where b is the 2x2 identity matrix, T c s is the effective time- 



4 NUMERICAL SIMULATIONS 

Speaking in terms of the statistical tests theory, two kinds 
of mistakes can be made in the signal detection problem: 
the false alarm and the false non-detection. Our primary 
goal was to keep the false alarm probability at some a priori 
small levels FAP < FAP„. This is guaranteed by the theo- 
retical inequality (|4]). Now our goal is to characterize (given 
the condition of bounded FAP) the detection power, which 
is provided by the actual precision of the FAP estimation. 
We noted above that the right hand side in @ is expected to 
provide some approximation to the FAP, not just an upper 
bound. However, the error of this approximation depends on 
conditions: in the case when distant periodogram values are 
weakly correlated, this approximation should be precise, and 
in the case when there exist pairs (or more complicated com- 
binations) of strongly correla ted distant pe riodogram values, 
this precision decreases (see [Balucv 2008], Appendix B). In 
practice, the absence of strongly correlated peaks means that 
the periodograms are free from aliases. 

Since we have been already prevented (at the given 
probability FAP,) from false alarms by the upper charac- 
ter of the Davies bound Q, now we are more interested 
in precise approximation of detection thresholds (i.e., such 
critical values 2* that provide FAP(2„) = FAP*) rather than 
of the FAPs themselves, because it is the detection thresh- 
old z* that determine the detection probability. This means 
that we should pay major attention to horizontal deviations 
between the simulated and theoretical FAP curves, rather 
than to vertical ones. 
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Figure 1. Simulated vs. analytic false alarm probability for the 
multi-harmonic periodogram of N = 1000 evenly spaced observa- 
tions. Results for n = 1, 2, 3, 5 are shown as converging bunches of 
curves from left to right on each panel. The frequency bandwidth 
was / max T = 50 (top panel) and 500 (bottom panel). Here and in 
all other similar figures further, the number of Monte Carlo trials 
was about 10 5 for each simulation curve. 



We now proceed to testing the precision of the theoret- 
ical approximations obtained above using Mo nte Carlo sim - 
ulations of FAP max , in the same way as in (|Baluev| [2008). 
When the order of the approximating trigonometric polyno- 
mial grows, the volume of necessary calculations increases 
significantly due to the following reasons: 

(i) The calculations of single values of the multi-harmonic 
periodogram require to solve higher-dimensional linear least- 
squares problem (or to orthogonalize higher-dimensional 
functional bases). 

(ii) As the simulations have shown, the average density of 
peaks on the multi-harmonic periodograms increase roughly 
as 0(n). This requires for the calculations to be performed 
on a more dense frequency grid, in order to obtain enough 
accurate values of periodogram maxima. 

Therefore, our abilities in making numerical simulations are 
severely limited to small n only. 

Firstly let us deal with the case when the time series 
does not produce any aliasing in the classical sense, i.e. 
on the LS periodogram. This is the case of a large num- 
ber of evenly distributed observations. The corresponding 
simulated FAP max curves are shown in Fig. [I] for the peri- 
odograms z(f). We can see that the theorectical approxi- 
mations work quite well. Nevertheless, the small deviations 
for the cases n ^ 2 contrast with the LS case n = 1, for 
which we cannot see any deviation at all. Probably these 
small deviations emerged because of an extra correlation 
of distant periodogram values, caused by the fact that the 
model ^ incorporates several sinusoidal harmonics instead 
of one. Thus the periodogram values at two frequencies /i 
and fi appear correlated if /1//2 ~ p/q for some integers p, q 
not exceeding n (for n = 1 we had only the trivial condition 



fi ~ fz) - Nevertheless, this subtle self-aliasing effect seems 
to have negligible influence on the precision of our analytic 
FAP estimation, at least for n ^ 5. The corresponding er- 
rors of periodogram detection thresholds are about (or less) 
2—3 per cent in these cases. Since the signal amplitude scales 
roughly as ^/z, this results in only ~ 1 per cent inaccuracy 
in the amplitude thresholds. 

When the number of observations decreases and their 
temporal distribution becomes non-uniform, the precision of 
the analytic approximations for n 2 decreases in the same 
manner as for the usual LS periodogram, n = 1. Fig.[2]shows 
a series of simulations for randomly spaced time series and 
for a time series with imposed periodic gapping of timings. 
We can see that the approximations of the threshold levels 
z*, corresponding to FAP* ~ 0.01, still are rather precise 
in many cases, which quite could correspond to a practical 
situation. The precision of the theoretical approximations 
decreases when / max or n grow, when N decreases, or when 
the degree of the non-uniformity of timings distribution in- 
creases. Nevertheless, even in the worst cases the relative 
error of z* (corresponding to FAP, = 0.01) does not exceed 
~ 20 per cent, resulting in only ~ 10 per cent overestimation 
of the corresponding amplitude thresholds. Such loss of pre- 
cision still is not catastrophica l and quite can be tolerated. 

The paper (|Baluevl 120081 ) in fact paid undeservedly 
small attention to the modified LS periodograms zi,2,3(/)- 
It was assumed that the behaviour of their FAP curves 
is similar to the behaviour of the FAP curves of the bas- 
ing LS periodogram. However, they are the modified peri- 
odograms which are usually used in practice. Here we try 
to correct this mistake. It appears that the FAP curves of 
modified periodograms are considerably less sensitive to an 
uneven time series sampling. Consequently, the precision 
of the Davies bound Q and of the alias-free approxima- 
tion © appears significantly better (see Fig. [3J). For the 
modified multi-harmonic periodograms, the random distri- 
bution of timings does not introduce any significant pertur- 
bation of the FAP curve even for iV as small as 30. In this 
case, the FAP curves for the modified periodograms per- 
fectly agree with the alias-free approximation ([5]). For peri- 
odically gapped timings, the precision of the analytic FAP 
estimations improves too. Moreover, this precision does not 
decrease and even seem to increase when the order n or the 
frequency bandwidth / max grow. The reason for such refine- 
ment of the precision of the analytic estimations of the FAP 
for the modified periodograms is unclear. 

It is harder to complete a similar series of Monte Carlo 
simulations for more complicated cases, e.g. with the base 
model [i-H incorporating at least a constant or a linear trend. 
We present only a few examples of such simulations, which 
nevertheless further certify the practical efficiency of the 
closed expresions for the FAP described above (Fig. 3]). Ac- 
tually, it looks that a low-order polynomial trend in the base 



2 One may argue that such undestanding of the notion 'aliasing' 
is not traditional, because the associated effect is not connected 
with uneven time series sampling, and is only a result of an inter- 
play between the main period and its subharmonics. Nevertheless, 
for the sake of a uniform terminology, we name here all 'wrong' 
periodogram peaks as aliases, and the associated phenomenon of 
correlativity of distant periodogram values as aliasing. 
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Figure 2. Simulated vs. analytic false alarm probability for the basic multi-harmonic periodogram z(f). For the panels in the left and 
middle columns, the time series consisted of N = 100 and N = 30 randomly spaced datapoints, respectively. For the panels in the 
right column N = 100 datapoints were clumped in ten evenly spaced groups. Each group consisted of ten points and spanned only 1/50 
fraction of the total time-span (instead of the natural 1/10 fraction). In each panel, four converging bunches of curves from left to right 
correspond to n = 1, 2, 3, 5. For the top raw / max T = 50, for the middle one / max T = 500, and for the bottom one / max T = 5000. 
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Figure 3. Same as in Fig. [2] but for the modified periodogram zi(/). 



model nn does not introduce any visible deviation in the 
simulated FAP curve, at least in this particular case. 

Finally, let us take some realistic time series sampling 
and consider the associated FAP curves and their approxi- 
mations under some realistic conditions. For this purpose, as 



in the paper (|Baluevll200"8h . we use the observational dates 
and standard errors of the high- precision radial v elocity data 
for the stars 51 Peg and 70 Vir (|Naef et aLlbOO^ . The num- 
ber of observations in the first time series is N — 153 and 
in the second one iV = 35. The time-span of these time se- 
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Figure 4. Simulated vs. analytic false alarm probability for the 
modified multi- harmonic (n = 2) periodogram zi(/), constructed 
from N = 100 evenly spaced observations, in the main frequency 
band J milI T = 50. The graph shows four simulated FAP curves 
for different degrees of the polynomial trend in the base model 
fi-ft'- empty base model (dj-t = 0), a constant term (df-i = 1), 
a linear trend (d^ = 2), a quadratic trend (dn = 3). All these 
curves appear almost coinciding. For an intercomparison, we show 
here the theoretical distribution curves for all the modified peri- 
odograms, z\, Zs, and 22 (from left to right). Note that we plot 
them only for the case d-y± = 0, because the similar curves for 
dyi = 1. 2, 3 did not show any visible deviation. 
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Figure 5. Simulated vs. analytic false alarm probability for the 
multi-harmonic (n = 3) periodogram z\ (/) constructed from the 
radial velocity time series of 51 Peg (top) and 70 Vir (bottom). 
The base model fi^i incorporated a free constant term (df-i = 1). 
In each panel, three bunches of curves correspond to P m i n = 
l//max = 100 days, 10 days, and 1 day (from left to right). 



levels FAP > 10^ which can be reliably modelled using 
10 5 Monte Carlo trials. 



5 CONCLUSIONS 

In this paper, previous results by iBaluevI (|2008l ) are ap- 
plied to the case when the model of the signal to be de- 
tected represents a truncated Fourier polynomial. Closed 
analytic expressions for the false alarm probabilities, asso- 
ciated with multi-harmonic periodogram peaks, are given. 
They are tested under various conditions using Monte Carlo 
simulations. The simulations have shown that the accuracy 
of the mentioned theoretical estimations usually is quite 
suitable in practice. Also, these simulations have revealed an 
unexpected (but pleasant) phenomenon: the accuracy of the 
above theoretical approximations of the FAP is considerably 
better for the normalized multi-harmonic periodograms than 
for the basic, purely least-squares, ones. Since in practice the 
observational noise variance is rarely known precisely, they 
are the normalized peridograms that are usually dealt with. 
Therefore, the better behaviour of the FAP curves for the 
normalized periodograms has high practical value. 

The necessary amount of Monte Carlo simulations for 
the multi-harmonic periodogram is bigger than for the LS 
one. It may appear very difficult to obtain a sufficiently pre- 
cise Monte Carlo estimation of the false alarm probability 
even in the case of a single time series. Most likely, for sur- 
veys dealing with large numbers of separate time series, it 
would be impossible to perform the necessary amount of 
Monte Carlo simulations. For example, a single CPU at 2 
GHz would complete all Monte Carlo simulations presented 
above in a few months only. On contrary, the closed theo- 
rectical estimations presented in this paper do not require 
any simulations at all, and simultaneously often have a nice 
accuracy. This indicates that the mentioned estimations rep- 
resent a promising practical tool and may be used in a 
wide variety of astronomical applications, involving search 
for non-sinusoidal periodicities in observational data. The 
corresponding research fields are ranged from the studies of 
variable stars to the studies of extrasolar planetary systems. 
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be calculated easily. The result is given in (|8]), and the eigen- 
values required are approximated as A2*; ~ \2k-1 ~ nT^k 2 . 

It is not hard to check that when our base model fin is 
not empty but contains a free constant term or a low-order 
polynomial drift with free coefficients, the same approxima- 
tion for the matrix M holds true under similar conditions. 
In this case, the base model fi-n appears approximately or- 
thogonal to the signal model fi in the sense that the cross 
averages tpn C2> ip can be neglected in comparison with the 
respective elements of the matrices tpn ® tpn and tp ® tp. 

Coupled with the obtained approximate expressions for 
the eigenvalues A*,, the eq. ([7]) yields the eq. Q with 



27rr ( 



OO 

1 fd l )*L 

n+i)7 V nLi(i+^ 2 )y * 3/2 



(2n - 1)!! 2tt 



^ 1 



n: = i(i+^ 2 fc 2 )/ * 



% (A2) 



APPENDIX A: THE FACTOR A(F M ax) 

The elements of the matrices Q, S, R can be t ransformed in 
the way similar to eqs. (10) in l|Baluevll2008l ). The matrix 
Q will contain the averages of the kind sin kuit and cos kuit. 
The matrix S will contain components tsinkuit, t cos kuit, 
and also i. The matrix R will contain components of the 

1,2,. 



kind t 2 sin kuit, t 2 cos kuit, and also t 2 . Here k 
Therefore, we deal with quantities having the form 



fi s (/) = sinuit, 
A s (f) = t sin cut, 
S s (/) = t 2 sinuit, 



f2 c (/) = cos cut, 



A c (/) = tcOSUlt, 



H c (/) = t 2 COSUlt, 



. ,2n. 



(Al) 



and with the similar overtonic quantities, calculated at the 
frequencies 2/, 3/, . . . , 2nf. Now our goal is to show that 
under certain conditions the quantities fi c , s , A CjS , H ClS have 
small magnitude in comparison with the quantities l,i,t 2 , 
resp., and thus can be neglected. Let us assume that at the 
given frequency / the phases uiU are distributed approxi- 
mately uniformly in the segment [0, 2n]. This means that 
the multipliers cosuit and sinuit in (|A1|) may be consid- 
ered as random quantities. Their values are jumping ran- 
domly in the segment [—1,-1-1], whereas the functions 1, t, 
and t 2 are varying slowly. Therefore, the mentioned sines 
and cosines may be treated as random quantities not corre- 
lated with the timings ti. This quasirandom property allows 
us to write down approximations like sinuit ~ l/s/N and 
t sinuit « (t)(sinuit) ~ i/yN. 

Therefore, all the quantities (| A lp may be expected to 
be negligible (at the given frequency /) when the values 
of fl C|S (/) are small. It is not hard to see that fi(/) = 
Q c (f) + iQ„(f) — e lu ' t (with i being the imaginary unit) rep- 
resents the complex spectral window of the time series and 
the square of its module is the usual spectral window. Typ- 
ically, the spectral window contains a strong narrow peak 
at / = and a series of smaller peaks, corresponding to 
aliasing frequencies. Therefore, in the case when the spectral 
window does not contain any strong peaks at the frequencies 
/, 2/, . . . , 2nf, we can keep in the matrices Q, S, R only the 
terms, which do not contain sines or cosines inside the av- 
eraging operation. In this approximation, the matrix M can 



We can calculate the integral in (|A2|I using the theory of 
functions of a complex variable. Denoting the integrand in 
the last integral in ()A2|) as f(x), we can easily check that 
lim^^oo = (where x is considered as a complex 

variable). This means that we can replace the integration 
line (—00, +00) by a closed contour Cr, representing a semi- 
circle of the radius R — » 00 in the upper complex semiplane. 
Indeed, the integral over the semicircle arc decays at least 
as rapidly as ~ irRf(R) ~ n/R — > when R — > 00, and the 
integral over the diameter of the semi-circle, (-R, R) tends 
to the integral within (—00, +00) that we need to compute. 

The integrand f(x) can be represented as a ratio of two 
algebraic polynomials: f(x) = P(x) /Q(x), where Q(x) = 

rife=i( 1 + a;2fc2 ) and p ( x ) = (Q( x ) ~ 1 )/ x ' 2 is not hard to 
see that P(x) is indeed a polynomial of degree 2n— 2, because 

the free term in Q(x) is unit and hence the denominator x 2 

is reduced). Therefore, the integral over Cr can be expressed 

via the sum of residues of f(x) in the points Xk = i/k, k = 

1,2, ... ,n (with i being the imaginary unit), which represent 

the roots of Q(x) in the upper complex semiplane. That is, 



2ni 



-\-oo 

[ f( x )Ax = ±- lim / f(x)dx = VRes/( Ifc ). (A3) 
J 2m «-><x> J j-^ 



Since the singularities Xk are simple poles, the corresponding 
residues can be evaluated as 



Resf(x k ) = 



P(Xk) = 

Q'(xk) 2iU 

k 2n-l 



j = l..n,j^k(^- 3 2 /k 2 ) 

(-l) n - k k 2n+1 



2in,=i. 



k(k 2 



i(n + k)\{n - fc)! 



(A4) 



The formulae (|A2IA3IA4|I yield the final expression (fTTJ)) . 

Note that alternatively we could use the treatment in- 
volving ellip soidal surface s in multi-dimensional spaces (see 
eq. (B7) by |BalueyJ 2008). This way seems to be less con- 
vinient to obtain exact formulae for a n , but nonetheless it 
yields a simple upper bound 



1 



(n-l)\\ 



1 - 

n ^ 



1 



(n-1)! 



(n + l)(2n + l) 



(A5) 
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Figure Al. The figure shows the precision of the alias-free approximation of the factor A(/ max ). Top-left panel: the graph of the ratio 
of the derivative A'(/ max ) (the inner integral in J7)) to its alias-free approximation 27r n +°- 5 a n T c ff . On an almost horizontal graph, we 
can see a sequence of strong but narrow splashes corresponding to aliasing periods. Bottom-left panel: the similar graph for the function 
A(/ max ) itself. The splashes at the aliasing frequencies exist but are very small and do not produce significant perturbations. The data 
were obtained for n = 3 and d-j^ = 1 (with a free constant term in the model fi-u). The N = 100 timings of the mock input time series 
were periodically gapped with a frequency corresponding to Tf pa 28.5. At this gapping frequency, the folded phases spanned only 10% 
of the full period. Right panels show similar graphs for the case of TV = 30 randomly spaced observations, n = 5 and d-y^ = 1. 



The comparison of this bound with numerical values from This paper has been typeset from a TjjjX/ I^Tj^X file prepared 
Table [T] shows that this bound is remarkably sharp. by the author. 

Formally, the approximation ([9]) was based on certain 
assumptions of negligible aliasing, which we have discussed 
above. Nevertheless, it was demonstrated in (|Baluevll200"8h 
for the LS periodogram, that this approximation of the fac- 
tor A(/ max ) is quite precise in practice, even when the alias- 
ing effects are strong. We may expect the same behaviour 
of A(/ max ) for multi-harmonic periodograms. This is due to 
the integral character of the representation ©. Indeed, the 
aliasing may result in a strong perturbation of the eigen- 
values Afc and hence of the inner integral in (JT}. However, 
these perturbing effects are locked in very narrow frequency 
intervals of the typical width AW ~ 1. After integration 
over a wide frequency range with W ^> 1, the resulting per- 
turbation in the whole integral appear insignificant. This is 
illustrated in Fig. IA1I 

Therefore, the only practically important source of a 
possible inaccuracy of the analytic FAP estimation lies in 
the possible unsharpness of the Davies bound @ itself and 
in the possible inaccuracy of the associated alias-free ap- 
proximation ([5]) itself. 



